Abstract

We present a component-based method and two global methods for face recognition and evaluate them with respect to robustness against pose changes. In the component system we first locate facial components, extract them and combine them into a single feature vector which is classified by a Support Vector Machine (SVM). The two global systems recognize faces by classifying a single feature vector consisting of the gray values of the whole face image. In the first global system we trained a single SVM classifier for each person in the database. The second system consists of sets of viewpoint-specific SVM classifiers and involves clustering during training. We performed extensive tests on a database which included faces rotated up to about 40° in depth. The component system clearly outperformed both global systems on all tests.


1.  Introduction 

Over the past 20 years numerous face recognition papers have been published in the computer vision community; a survey can be found in [4]. The number of real-world applications (e.g. surveillance, secure access, human/computer interface) and the availability of cheap and powerful hardware have also led to the development of commercial face recognition systems. Despite the success of some of these systems in constrained scenarios, the general task of face recognition still poses a number of challenges with respect to changes in illumination, facial expression, and pose.

In the following we give a brief overview of face recognition methods. Focusing on the aspect of pose invariance, we divide face recognition techniques into two categories: i) global approach and ii) component-based approach.

i) In this category a single feature vector that represents the whole face image is used as input to a classifier. Several classifiers have been proposed in the literature, e.g. minimum distance classification in the eigenspace [18, 20], Fisher's discriminant analysis [1], and neural networks [6]. Global techniques work well for classifying frontal views of faces. However, they are not robust against pose changes since global features are highly sensitive to translation and rotation of the face. To avoid this problem an alignment stage can be added before classifying the face. Aligning an input face image with a reference face image requires computing correspondences between the two face images. The correspondences are usually determined for a small number of prominent points in the face like the center of the eye, the nostrils, or the corners of the mouth. Based on these correspondences the input face image can be warped to a reference face image. In [12] an affine transformation is computed to perform the warping. Active shape models are used in [10] to align input faces with model faces. A semi-automatic alignment step in combination with SVM classification was proposed in [9].

ii) An alternative to the global approaches is to classify local facial components. The main idea of component-based recognition is to compensate for pose changes by allowing a flexible geometrical relation between the components in the classification stage. In [3] face recognition was performed by independently matching templates of three facial regions (both eyes, nose and mouth). The configuration of the components during classification was unconstrained since the system did not include a geometrical model of the face. A similar approach with an additional alignment stage was proposed in [2]. In [23] a geometrical model of a face was implemented by a 2-D elastic graph. The recognition was based on wavelet coefficients that were computed on the nodes of the elastic graph. In [14] a window was shifted over the face image and the DCT coefficients computed within the window were fed into a 2-D Hidden Markov Model.

We present two global approaches and a component-based approach to face recognition and evaluate their robustness against pose changes. The first global method consists of a face detector which extracts the face part from an image and propagates it to a set of SVM classifiers that perform the face recognition. By using a face detector we achieve translation and scale invariance. In the second global method we split the images of each person into viewpoint-specific clusters. We then train SVM classifiers on each single cluster. In contrast to the global methods, the component system uses a face detector that detects and extracts local components of the face. The detector consists of a set of SVM classifiers that locate facial components and a single geometrical classifier that checks if the configuration of the components matches a learned geometrical face model. The detected components are extracted from the image, normalized in size and fed into a set of SVM classifiers.

The outline of the paper is as follows: Chapter 2 gives a brief overview of SVM learning and of strategies for multi-class classification with SVMs. In Chapter 3 we describe the two global methods for face recognition. Chapter 4 is about the component-based system. Chapter 5 contains experimental results and a comparison between the global and component systems. Chapter 6 concludes the paper.

2.  Support  Vector  Machine  Classification 

We first explain the basics of SVMs for binary classification [21]. Then we discuss how this technique can be extended to deal with general multi-class classification problems.

2.1. Binary Classification

SVMs belong to the class of maximum margin classifiers. They perform pattern recognition between two classes by finding a decision surface that has maximum distance to the closest points in the training set, which are termed support vectors. We start with a training set of points $x_i \in \mathbb{R}^n$, $i = 1, 2, \ldots, \ell$, where each point $x_i$ belongs to one of two classes identified by the label $y_i \in \{-1, 1\}$. Assuming linearly separable data^1, the goal of maximum margin classification is to separate the two classes by a hyperplane such that the distance to the support vectors is maximized. This hyperplane is called the optimal separating hyperplane (OSH). The OSH has the form:

$$f(x) = \sum_{i=1}^{\ell} \alpha_i y_i \, x_i \cdot x + b. \qquad (1)$$

The coefficients $\alpha_i$ and $b$ in Eq. (1) are the solutions of a quadratic programming problem [21]. Classification of a new data point $x$ is performed by computing the sign of the right side of Eq. (1). In the following we will use

$$d(x) = \frac{\sum_{i=1}^{\ell} \alpha_i y_i \, x_i \cdot x + b}{\left\| \sum_{i=1}^{\ell} \alpha_i y_i \, x_i \right\|} \qquad (2)$$

to perform multi-class classification. The sign of $d$ is the classification result for $x$, and $|d|$ is the distance from $x$ to the hyperplane. Intuitively, the farther away a point is from the decision surface, i.e. the larger $|d|$, the more reliable the classification result.

The entire construction can be extended to the case of nonlinear separating surfaces. Each point $x$ in the input space is mapped to a point $z = \Phi(x)$ of a higher dimensional space, called the feature space, where the data are separated by a hyperplane. The key property in this construction is that the mapping $\Phi(\cdot)$ is subject to the condition that the dot product of two points in the feature space $\Phi(x) \cdot \Phi(y)$ can be rewritten as a kernel function $K(x, y)$. The decision surface has the equation:

$$f(x) = \sum_{i=1}^{\ell} \alpha_i y_i \, K(x, x_i) + b,$$

where, again, the coefficients $\alpha_i$ and $b$ are the solutions of a quadratic programming problem. Note that $f(x)$ does not depend on the dimensionality of the feature space.

An important family of kernel functions is the polynomial kernel:

$$K(x, y) = (1 + x \cdot y)^d,$$

where $d$ is the degree of the polynomial. In this case the components of the mapping $\Phi(x)$ are all the possible monomials of input components up to the degree $d$.

^1 For the non-separable case the reader is referred to [21].

2.2. Multi-class classification

There are two basic strategies for solving $q$-class problems with SVMs:

i) In the one-vs-all approach $q$ SVMs are trained. Each of the SVMs separates a single class from all remaining classes [5, 17].

ii) In the pairwise approach $q(q-1)/2$ machines are trained. Each SVM separates a pair of classes. The pairwise classifiers are arranged in trees, where each tree node represents an SVM. A bottom-up tree similar to the elimination tree used in tennis tournaments was originally proposed in [16] for recognition of 3-D objects and was applied to face recognition in [7]. A top-down tree structure has been recently published in [15].

There is no theoretical analysis of the two strategies with respect to classification performance. Regarding the training effort, the one-vs-all approach is preferable since only $q$ SVMs have to be trained compared to $q(q-1)/2$ SVMs in the pairwise approach. The run-time complexity of the two strategies is similar: the one-vs-all approach requires the evaluation of $q$ SVMs, the pairwise approach the evaluation of $q-1$ SVMs. Recent experiments on person recognition show similar classification performances for the two strategies [13]. Since the number of classes in face recognition can be rather large we opted for the one-vs-all strategy where the number of SVMs is linear with the number of classes.
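The training and run-time counts quoted above can be checked with a few lines of Python. This is only an illustrative sketch; the function names and the strategy strings are hypothetical and do not come from the systems described in this paper.

```python
def svms_trained(q, strategy):
    """Number of binary SVMs that must be trained for a q-class problem."""
    if strategy == "one-vs-all":
        return q                      # one SVM per class
    if strategy == "pairwise":
        return q * (q - 1) // 2       # one SVM per unordered pair of classes
    raise ValueError("unknown strategy: " + strategy)


def svms_evaluated(q, strategy):
    """Number of binary SVMs evaluated to classify one pattern."""
    if strategy == "one-vs-all":
        return q                      # every class-specific SVM is queried
    if strategy == "pairwise":
        return q - 1                  # tree of pairwise decisions: each one eliminates a class
    raise ValueError("unknown strategy: " + strategy)


if __name__ == "__main__":
    for q in (5, 50, 500):
        print(q, svms_trained(q, "one-vs-all"), svms_trained(q, "pairwise"))
```

For the five subjects used in the experiments this gives 5 versus 10 trained SVMs; the gap grows quadratically with the number of classes, which is the reason given above for choosing one-vs-all.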

3.  Global  Approach 

Both global systems described in this paper consist of a face detection stage where the face is detected and extracted from an input image and a recognition stage where the person's identity is established.

3.1.  Face  detection 

We developed a face detector similar to the one described in [8]. In order to detect faces at different scales we first computed a resolution pyramid for the input image and then shifted a 58 x 58 window over each image in the pyramid. We applied two preprocessing steps to the gray images to compensate for certain sources of image variations [19]. A best-fit intensity plane was subtracted from the gray values to compensate for cast shadows. Then histogram equalization was applied to remove variations in the image brightness and contrast. The resulting gray values were normalized to be in a range between 0 and 1 and were used as input features to a linear SVM classifier. Some detection results are shown in Fig. 1.
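The two preprocessing steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the detector: the image is assumed to be a full rectangular 2-D list of gray values, the function names are hypothetical, and the equalization is a rank-based variant that works on real-valued pixels and already returns values in [0, 1].

```python
def subtract_best_fit_plane(img):
    """Subtract the least-squares plane a*x + b*y + c from a 2-D list of gray values."""
    h, w = len(img), len(img[0])
    mean_x, mean_y = (w - 1) / 2.0, (h - 1) / 2.0
    mean_i = sum(sum(row) for row in img) / (h * w)
    # On a full rectangular grid the centered x and y coordinates are orthogonal
    # to each other and to the constant term, so the normal equations decouple.
    sxx = h * sum((x - mean_x) ** 2 for x in range(w))
    syy = w * sum((y - mean_y) ** 2 for y in range(h))
    sxi = sum((x - mean_x) * img[y][x] for y in range(h) for x in range(w))
    syi = sum((y - mean_y) * img[y][x] for y in range(h) for x in range(w))
    a = sxi / sxx if sxx else 0.0
    b = syi / syy if syy else 0.0
    return [[img[y][x] - (a * (x - mean_x) + b * (y - mean_y) + mean_i)
             for x in range(w)] for y in range(h)]


def equalize_histogram(img):
    """Rank-based histogram equalization; the output values lie in [0, 1]."""
    flat = sorted(v for row in img for v in row)
    cdf = {}
    for rank, v in enumerate(flat, start=1):
        cdf[v] = rank                 # number of pixels with value <= v
    cdf_min = cdf[flat[0]]
    denom = len(flat) - cdf_min
    if denom == 0:                    # constant image: nothing to equalize
        return [[0.0 for _ in row] for row in img]
    return [[(cdf[v] - cdf_min) / denom for v in row] for row in img]


def preprocess(img):
    """Plane subtraction followed by equalization, as described in the text."""
    return equalize_histogram(subtract_best_fit_plane(img))
```

A window whose gray values form an exact intensity ramp is flattened to (numerically) zero by the first step, which is the intended effect for a smooth illumination gradient across the face.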

The training data for the face detector were generated by rendering seven textured 3-D head models [22]. The heads were rotated between -30° and 30° in depth and illuminated by ambient light and a single directional light pointing towards the center of the face. We generated 3,590 face images of size 58 x 58 pixels. The negative training set initially consisted of 10,209 58 x 58 non-face patterns randomly extracted from 502 non-face images. We expanded the training set by bootstrapping [19] to 13,655 non-face patterns.

3.2.  Recognition 

We  implemented  two  global  recognition  systems.  Both 
systems  were  based  on  the  one-vs-all  strategy  for  SVM 
multi-class  classification  described  in  the  previous  Chapter. 

Figure 1. The upper two rows are example images from our training set. The lower two rows show the image parts extracted by the SVM face detector.

The first system had a linear SVM for every person in the database. Each SVM was trained to distinguish between all images of a single person (labeled +1) and all other images in the training set (labeled -1). For both training and testing we ran the face detector on the input image to extract the face. We re-scaled the face image to 40 x 40 pixels and converted the gray values into a feature vector^2. Given a set of $q$ people and a set of $q$ SVMs, each one associated with one person, the class label $y$ of a face pattern $x$ is computed as follows:

$$y = \begin{cases} n & \text{if } d_n(x) + t > 0 \\ 0 & \text{if } d_n(x) + t \le 0 \end{cases} \qquad \text{with } d_n(x) = \max \{ d_i(x) \}_{i=1}^{q}, \qquad (3)$$

where $d_i(x)$ is computed according to Eq. (2) for the SVM
trained  to  recognize  person  i.  The  classification  threshold  is 
denoted  as  t.  The  class  label  0  stands  for  rejection. 
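The decision rule of Eq. (3) can be sketched as follows for linear SVMs. The sketch assumes that each person's SVM is available as a weight vector `w` and an offset `b` (hypothetical names), so that the normalized distance of Eq. (2) is (w · x + b) / ||w||; the case where the thresholded distance is exactly zero is treated as a rejection, which is an assumption.

```python
import math


def signed_distance(w, b, x):
    """Normalized distance of x to the hyperplane w . x + b = 0, as in Eq. (2)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def classify(svms, x, t=0.0):
    """One-vs-all decision with rejection, following Eq. (3).

    svms: list of (w, b) pairs, one per person; person i is svms[i - 1].
    Returns the person index n in 1..q, or 0 for rejection.
    """
    distances = [signed_distance(w, b, x) for w, b in svms]
    n = max(range(len(distances)), key=distances.__getitem__)
    if distances[n] + t > 0:
        return n + 1      # accept: identity of the SVM with the largest distance
    return 0              # reject (assumption: equality also rejects)


if __name__ == "__main__":
    svms = [([1.0, 0.0], 0.0), ([0.0, 2.0], -2.0)]
    print(classify(svms, [3.0, 0.0]))            # person 1
    print(classify(svms, [-1.0, -1.0]))          # rejected
    print(classify(svms, [-1.0, -1.0], t=2.0))   # accepted once t is raised
```

Raising t trades rejections for accepted identifications, which is how the operating points of the ROC curves in the experiments are obtained.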

Changes in the head pose lead to strong variations in the images of a person's face. These in-class variations complicate the recognition task. That is why we developed a second method in which we split the training images of each person into clusters by a divisive cluster technique [11]. The algorithm started with an initial cluster including all face images of a person after preprocessing. The cluster with the highest variance is split into two by a hyperplane. The variance of a cluster is calculated as:

$$\sigma^2 = \min_{n = 1, \ldots, N} \left\{ \frac{1}{N} \sum_{m=1}^{N} \| x_n - x_m \|^2 \right\}$$

where $N$ is the number of faces in the cluster. After the partitioning has been performed, the face with the minimum distance to all other faces in the same cluster is chosen to be the average face of the cluster. Iterative clustering stops when a maximum number of clusters is reached^3. The average faces can be arranged in a binary tree. Fig. 2 shows the result of clustering applied to the training images of a person in our database. The nodes represent the average faces; the leaves of the tree are some example faces of the final clusters. As expected, divisive clustering performs a viewpoint-specific grouping of faces.

^2 We applied the same preprocessing steps to generate the features as for the face detector described.
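The clustering loop can be sketched as follows. The variance and the choice of the average face follow the description above; how the splitting hyperplane is chosen is not specified here (see [11]), so the sketch assumes one simple option: the perpendicular bisector of the two most distant faces in the cluster. All function names are hypothetical, and faces are plain lists of feature values.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def variance_and_average_face(faces):
    """Return (sigma^2, n): the min over n of the mean squared distance, and that n."""
    count = len(faces)
    means = [sum(sq_dist(xn, xm) for xm in faces) / count for xn in faces]
    n = min(range(count), key=means.__getitem__)
    return means[n], n


def split_by_hyperplane(faces):
    """ASSUMPTION: split by the perpendicular bisector of the two most distant faces."""
    count = len(faces)
    a, b = max(((i, j) for i in range(count) for j in range(i + 1, count)),
               key=lambda p: sq_dist(faces[p[0]], faces[p[1]]))
    left = [i for i in range(count)
            if sq_dist(faces[i], faces[a]) <= sq_dist(faces[i], faces[b])]
    left_set = set(left)
    right = [i for i in range(count) if i not in left_set]
    return left, right


def divisive_clustering(faces, max_clusters=4):
    """Repeatedly split the highest-variance cluster; returns lists of face indices."""
    clusters = [list(range(len(faces)))]   # assumes at least one face
    while len(clusters) < max_clusters:
        variances = [variance_and_average_face([faces[i] for i in c])[0]
                     for c in clusters]
        worst = max(range(len(clusters)), key=variances.__getitem__)
        if variances[worst] == 0.0:
            break                          # all remaining clusters are degenerate
        members = clusters[worst]
        left, right = split_by_hyperplane([faces[i] for i in members])
        clusters[worst:worst + 1] = [[members[i] for i in left],
                                     [members[i] for i in right]]
    return clusters
```

The default of four clusters mirrors the number used in our experiments; any other splitting rule that separates a cluster by a hyperplane could be substituted for `split_by_hyperplane` without changing the loop.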

We trained a linear SVM to distinguish between all images in one cluster (labeled +1) and all images of other people in the training set (labeled -1)^4. Classification was done according to Eq. (3) with $q$ now being the number of clusters of all people in the training set.


Figure 2. Binary tree of face images generated by divisive clustering.

^3 In our experiments we divided the face images of a person into four clusters.

^4 This is not exactly a one-vs-all classifier since images of the same person but from different clusters were omitted.


4. Component-based Approach

The global approach is highly sensitive to image variations caused by changes in the pose of the face. The component-based approach avoids this problem by independently detecting parts of the face. For small rotations, the changes in the components are relatively small compared to the changes in the whole face pattern. Changes in the 2-D locations of the components due to pose changes are accounted for by a learned, flexible face model.

4.1. Detection

We implemented a two-level component-based face detector which is described in detail in [8]. The principles of the system are illustrated in Fig. 3. On the first level, component classifiers independently detected facial components. On the second level, a geometrical configuration classifier performed the final face detection by combining the results of the component classifiers. Given a 58 x 58 window, the maximum continuous outputs of the component classifiers within rectangular search regions around the expected positions of the components were used as inputs to the geometrical configuration classifier. The search regions have been calculated from the mean and standard deviation of the components' locations in the training images. We also provided the geometrical classifier with the precise positions of the detected components relative to the upper left corner of the 58 x 58 window. The 14 facial components used in the detection system are shown in Fig. 4 (a). The shapes and positions of the components have been automatically determined from the training data in order to provide maximum discrimination between face and non-face images; see [8] for details about the algorithm. The training set was the same as for the whole face detector described in the previous Chapter.

Figure 3. System overview of the component-based face detector using four components. On the first level, windows of the size of the components (solid lined boxes) are shifted over the face image and classified by the component classifiers. On the second level, the maximum outputs of the component classifiers within predefined search regions (dotted lined boxes) and the positions of the detected components are fed into the geometrical configuration classifier. (Diagram labels: First Level: Component Classifiers; Second Level: Detection of Configuration of Components; Output of Eye Classifier, Output of Nose Classifier, Output of Mouth Classifier.)
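The second-level input described in Section 4.1 can be sketched as follows. The sketch assumes that each component classifier has already been evaluated at every position of the 58 x 58 window, giving one 2-D map of continuous outputs per component, and that the search regions are inclusive boxes (x0, y0, x1, y1) in window coordinates. The ordering (output, x, y) of the resulting vector and all names are hypothetical; the actual layout is described in [8].

```python
def geometric_features(score_maps, search_regions):
    """Build the input vector for the geometrical configuration classifier.

    score_maps:     one 2-D list per component, score_maps[k][y][x] being the
                    continuous output of component classifier k at position (x, y).
    search_regions: one (x0, y0, x1, y1) box per component, bounds inclusive.
    """
    features = []
    for scores, (x0, y0, x1, y1) in zip(score_maps, search_regions):
        # Maximum classifier output inside the search region and where it occurred,
        # measured relative to the upper left corner of the window.
        best, bx, by = max((scores[y][x], x, y)
                           for y in range(y0, y1 + 1)
                           for x in range(x0, x1 + 1))
        features.extend([best, bx, by])
    return features


if __name__ == "__main__":
    eye = [[0.1, 0.2], [0.9, 0.3]]
    nose = [[-1.0, 2.0], [0.5, 3.0]]
    print(geometric_features([eye, nose], [(0, 0, 1, 1), (0, 0, 1, 0)]))
```

Restricting the maximum to a search region is what ties each component to its expected location while still tolerating the shifts caused by pose changes.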


Figure 4. (a) shows the 14 components of our face detector. The centers of the components are marked by a white cross. The 10 components that were used for face recognition are shown in (b).


4.2. Recognition

To train the face recognizer we first ran the component-based detector over each image in the training set and extracted the components. From the 14 original components we kept 10 for face recognition, removing those that either contained few gray value structures (e.g. cheeks) or strongly overlapped with other components. The 10 selected components are shown in Fig. 4 (b). Examples of the component-based face detector applied to images of the training set are shown in Fig. 5. To generate the input to our face recognition classifier we normalized each of the components in size and combined their gray values into a single feature vector^5. As for the first global system we used a one-vs-all approach with a linear SVM for every person in the database. The classification result was determined according to Eq. (3).


Figure 5. Examples of component-based face detection. Shown are face parts covered by the 10 components that were used for face recognition.


^5 Before extracting the components we applied the same preprocessing steps to the detected 40 x 40 face image as in the global systems.


5. Experiments


The training data for the face recognition system were recorded with a digital video camera at a frame rate of about 5 Hz. The training set consisted of 8,593 gray face images of five subjects, of which 1,383 were frontal views. The resolution of the face images ranged between 80 x 80 and 130 x 130 pixels with rotations in azimuth up to about ±40°. The test set was recorded with the same camera but on a separate day and under different illumination and with a different background. The set included 974 images of all five subjects in our database. The rotations in depth were again up to about ±40°.

Two experiments were carried out. In the first experiment we trained on all 8,593 rotated and frontal face images in the training set and tested on the whole test set. This experiment contained four different tests: global approach using one linear SVM classifier for each person, using one linear SVM classifier for each cluster, using one second-degree polynomial SVM classifier for each person, and component-based approach using one linear SVM classifier for each person.


Figure 6. ROC curves when trained and tested on frontal and rotated faces. Plot title: Component vs. Whole Face (Training: 5 people, 8,593 frontal and rotated images. Test: 5 people, 974 frontal and rotated images).


In the second experiment we trained only on the 1,383 frontal face images in the training set but tested on the whole test set. This experiment contained three different tests: global approach using one linear SVM classifier for each person, using one linear SVM classifier for each cluster, and component-based approach using one linear SVM classifier for each person.

Figure 7. ROC curves when trained on frontal faces and tested on frontal and rotated faces. Plot title: Component vs. Whole Face (Training: 5 people, 1,383 frontal images. Test: 5 people, 974 frontal and rotated images).


The ROC curves of these two experiments are shown in Fig. 6 and Fig. 7, respectively. Each point on the ROC curve corresponds to a different value of the classification threshold $t$ from Eq. (3). At the end points of the ROC curves the rejection rate is 0. Some results of the component-based recognition system are shown in Fig. 8.
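How one ROC point per threshold arises can be sketched as follows. The sketch assumes that for every test image the largest normalized SVM distance and the identity of the corresponding SVM have been stored, and it assumes a particular pair of axes (fraction of wrongly accepted images versus fraction of correctly identified images); the axis definitions and all names are assumptions made for illustration only.

```python
def roc_points(max_distances, predicted, truth, thresholds):
    """Return one (false-accept rate, correct-recognition rate) pair per threshold t.

    max_distances: d_n(x) for each test image (largest SVM distance).
    predicted:     identity n of the SVM that produced that distance.
    truth:         true identity of each test image.
    """
    total = len(truth)
    points = []
    for t in thresholds:
        correct = wrong = 0
        for d, p, y in zip(max_distances, predicted, truth):
            if d + t > 0:             # image accepted under Eq. (3)
                if p == y:
                    correct += 1
                else:
                    wrong += 1
        points.append((wrong / total, correct / total))
    return points


if __name__ == "__main__":
    d = [1.0, -0.5, -2.0]
    pred = [1, 2, 1]
    true = [1, 2, 2]
    for point in roc_points(d, pred, true, thresholds=[0.0, 1.0, 3.0]):
        print(point)
```

Once t exceeds the magnitude of the most negative distance, every image is accepted, which corresponds to the end point of the curve where the rejection rate is 0.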

There  are  three  interesting  observations: 

•  In both experiments the component system clearly outperformed the global systems. This is despite the fact that the face classifier itself (5 linear SVMs) was less powerful than the classifiers used in the global methods (5 non-linear SVMs in the global method without clustering, and 20 linear SVMs in the method with clustering).

•  Involving clustering led to a significant improvement of the global method when the training set included rotated faces. This is because clustering generates viewpoint-specific clusters that have smaller in-class variations than the whole set of images of a person. The global method with clustering and linear SVMs was also superior to the global system without clustering and a non-linear SVM (see Fig. 6). This shows that a combination of weak classifiers trained on properly chosen subsets of the data can outperform a single, more powerful classifier trained on the whole data.

•  Adding rotated faces to the training set improved the results of the global method with clustering and the component method. Surprisingly, the results for the global method without clustering got worse. This indicates that the problem of classifying faces of one person over a large range of views is too complex for a linear classifier. Indeed, the performance significantly improved when using non-linear SVMs with a second-degree polynomial kernel.


Figure 8. Examples of component-based face recognition. The first 3 rows and the first image in the last row show correct identification. The last two images in the bottom row show misclassifications due to strong rotation and facial expression.


6.  Conclusion 

We presented a component-based technique and two global techniques for face recognition and evaluated their performance with respect to robustness against pose changes. The component-based system detected and extracted a set of 10 facial components and arranged them in a single feature vector that was classified by linear SVMs. In both global systems we detected the whole face, extracted it from the image and used it as input to the classifiers. The first global system consisted of a single SVM for each person in the database. In the second system we clustered the database of each person and trained a set of view-specific SVM classifiers.

We tested the systems on a database which included faces rotated in depth up to about 40°. In all experiments the component-based system outperformed the global systems even though we used more powerful classifiers (i.e. non-linear instead of linear SVMs) for the global system. This shows that using facial components instead of the whole face pattern as input features significantly simplifies the task of face recognition.